Looking for Transliterations in a Trilingual English, French and Japanese Specialised Comparable Corpus
نویسندگان
چکیده
Transliterations and cognates have been shown to be useful in the case of bilingual extraction from parallel corpora. Observation of transliterations in a trilingual English, French and Japanese specialised comparable corpus reveals evidences that they are likely to be used with comparable corpora too, since they are an important and relevant part of the common vocabulary, but they also yield links between Japanese and English/French corpora.
منابع مشابه
Extracting Transliteration Pairs from Comparable Corpora
Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, English word cache is transliterated in Japanese asキャッシュ “kyasshu”. In many cases, recent transliterations are not recorded in machine readable dictionaries so it is impossible to rely on dictionary lookup to find transliteration equivalents. In this paper we describe a meth...
متن کاملToward a Comparable Corpus of Latvian, Russian and English Tweets
Twitter has become a rich source for linguistic data. Here, a possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia is investigated. Such a corpus, once constructed, might be of great use for multiple purposes including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. This pilot study shows that...
متن کامل(Utilisation de la similarité sémantique pour l'extraction de lexiques bilingues à partir de corpus comparables) [in French]
This paper presents a new method that aims to improve the results of the standard approach used for bilingual lexicon extraction from specialized comparable corpora. We attempt to solve the problem of context vector word polysemy. Instead of using all the entries of the dictionary to translate a context vector, we only use the words of the lexicon that are more likely to give the best character...
متن کاملMINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora
In this paper, we address the problem of mining transliterations of Named Entities (NEs) from large comparable corpora. We leverage the empirical fact that multilingual news articles with similar news content are rich in Named Entity Transliteration Equivalents (NETEs). Our mining algorithm, MINT, uses a cross-language document similarity model to align multilingual news articles and then mines...
متن کاملAcquisition of Medical Terminology for Ukrainian from Parallel Corpora and Wikipedia
The increasing availability of parallel bilingual corpora and of automatic methods and tools for their processing makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit various corpora available in several languages in order to build bilingual and trilingual terminologies. Typically, terminology information extracted in French and E...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009